Tokenization and Morphological Analysis for Malagasy
Abstract
The authors present a tokenizer and finite-state morphological analyzer [Beesley and Karttunen 2003] for Malagasy, based primarily on the discussion of Malagasy morphology in Keenan and Polinsky [1998] and Randriamasimanana [1986]. Words in Malagasy are built from roots by means of a variety of morphological operations such as compounding, affixation and reduplication. The authors analyze productive patterns of nominal and verbal morphology, and describe genitive compounding and suffixation for nouns and various derivational processes involving compounding and affixation for verbs. This work offers a computational analysis of Malagasy morphology, and forms the basis of a computational grammar and lexicon of Malagasy within the framework of the PARGRAM project.
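As a rough illustration of the operations the abstract names, the sketch below analyzes a few word forms by affixation and reduplication. It is a toy written in plain Python, not the authors' finite-state analyzer. The lexicon, the tag names (`+Verb`, `+Act`, `+Redup`), and the example forms (mividy, nividy, fotsifotsy) are assumptions chosen for illustration and are not taken from the paper.

```python
# Toy lexicon (hypothetical, for illustration only): root -> gloss.
ROOTS = {"vidy": "buy", "fotsy": "white"}

# Tense markers assumed to combine with the active prefix i-:
# m- present, n- past, h- future.
TENSES = {"m": "Pres", "n": "Past", "h": "Fut"}


def analyze(word):
    """Return every analysis of `word` licensed by the toy rules below."""
    analyses = []

    # Bare root.
    if word in ROOTS:
        analyses.append(f"{word}+Root")

    # Affixation: tense marker + active prefix i- + root (m-i-vidy).
    for marker, tag in TENSES.items():
        prefix = marker + "i"
        rest = word[len(prefix):]
        if word.startswith(prefix) and rest in ROOTS:
            analyses.append(f"{rest}+Verb+Act+{tag}")

    # Reduplication: a copy of the root followed by the root itself.
    # A final y in the first copy is written i once it is word-internal.
    for root in ROOTS:
        first = root[:-1] + "i" if root.endswith("y") else root
        if word == first + root:
            analyses.append(f"{root}+Root+Redup")

    return analyses


if __name__ == "__main__":
    for form in ["vidy", "mividy", "nividy", "hividy", "fotsifotsy"]:
        print(form, analyze(form))
```

A real finite-state analyzer would compile the lexicon and alternation rules into a transducer rather than looping over string prefixes, and it would also have to cover compounding and the sound changes at morpheme boundaries, which this sketch ignores.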
Related resources
Language Independent Morphological Analysis
This paper proposes a framework of language independent morphological analysis and mainly concentrates on tokenization, the first process of morphological analysis. Although tokenization is usually not regarded as a difficult task in most segmented languages such as English, there are a number of problems in achieving precise treatment of lexical entries. We first introduce the concept of morpho...
A Two-level Morphology of Malagasy
We present a two-level model of Malagasy nominal and verbal morphology (Beesley and Karttunen, 2003), based primarily on the discussion of Malagasy morphology in Keenan and Polinsky (1998) and Randriamasimanana (1986). Words in Malagasy are built from roots by means of a variety of morphological operations such as affixation and reduplication. The present paper analyzes productive patterns of n...
Consistent and Flexible Integration of Morphological Annotation in the Arabic Treebank
Treebank Annotation Issue: Multiple Levels of Annotation • Annotation not on the source text, but more abstract representation • How to maintain annotation consistency and relation between different levels? • How to make available the multiple levels of representation for the user? Arabic Treebank as a case study: • Mapping between two levels of annotation: • Morphological analysis of source te...
Tokenizing an Arabic Script Language
In any natural language processing project, the input text needs to undergo tokenization before morphological analysis or parsing. For Arabic script languages the tokenization process faces more problems and it plays a more crucial role in natural language processing (NLP) systems for Arabic script languages. In this work we elaborate on some of these problems and present solutions for these. T...
Arabic Tokenization, Part-of-Speech Tagging and Morphological Disambiguation in One Fell Swoop
We present an approach to using a morphological analyzer for tokenizing and morphologically tagging (including part-of-speech tagging) Arabic words in one process. We learn classifiers for individual morphological features, as well as ways of using these classifiers to choose among entries from the output of the analyzer. We obtain accuracy rates on all tasks in the
Journal title:
- IJCLCLP
Volume 11
Publication date: 2006